pkgs <- "
kthcorpus DT bslib htmltools dplyr downloadthis
"
import <- function(x)
  x |> trimws() |> strsplit("\\s+") |> unlist() |>
  lapply(function(x) library(x, character.only = TRUE)) |>
  invisible()
pkgs |> import()
Generate MODS from Scopus articles data
Scopus data retrieval
The Scopus APIs for publication search and extended abstract data can be used to retrieve metadata for Scopus publications.
Recent publications from KTH
Scopus data for KTH can be retrieved from the Scopus APIs. This assumes that the environment variables SCOPUS_API_KEY and SCOPUS_API_INSTTOKEN are available; they need to be set in the ~/.Renviron file. Requests count towards a rate-limit quota, which can be checked with scopus_ratelimit_quota().
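Before making any request, it can help to confirm that both variables are visible to the R session. The following is a minimal base R sketch, not part of kthcorpus; the variable name `not_set` and the messages are illustrative only.

```r
# Names of the environment variables mentioned above
required <- c("SCOPUS_API_KEY", "SCOPUS_API_INSTTOKEN")

# Sys.getenv() returns "" for variables that are not set
values <- Sys.getenv(required)
not_set <- required[!nzchar(values)]

if (length(not_set) > 0) {
  message("Not set: ", paste(not_set, collapse = ", "),
          " (add them to ~/.Renviron and restart R)")
} else {
  message("Scopus credentials found")
}
```

Changes to ~/.Renviron only take effect in a new R session, so a restart is needed after editing the file.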
scopus <- scopus_search_pubs_kth()
scopus_ratelimit_quota()
Because of the quota limit, and since a scheduled job already provides the latest data, a better approach is to request the data from object storage.
scopus <- scopus_from_minio()
Extended Abstract API data
Given the Scopus identifier of a publication, scopus_abstract_extended() retrieves additional information, including, for example, raw affiliation strings.
# use the first id
sid <- scopus$publications$`dc:identifier` |> head(1)
scopus_abstract_extended(sid)
$scopus_abstract
# A tibble: 1 × 19
`dc:publisher` srctype prism…¹ prism…² sourc…³ cited…⁴ prism…⁵ subtype opena…⁶
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Elsevier B.V. j 2023-0… Journal 25349 0 872 ar 1
# … with 10 more variables: `prism:issn` <chr>, subtypeDescription <chr>,
# `prism:publicationName` <chr>, openaccessFlag <chr>, `prism:doi` <chr>,
# `dc:identifier` <chr>, lang <chr>, keywords <chr>, sid <chr>,
# `dc:description` <chr>, and abbreviated variable names ¹`prism:coverDate`,
# ²`prism:aggregationType`, ³`source-id`, ⁴`citedby-count`, ⁵`prism:volume`,
# ⁶openaccess
$scopus_authorgroup
# A tibble: 8 × 26
sid id i ce_gi…¹ prefe…² prefe…³ prefe…⁴ prefe…⁵ autho…⁶ seq
<chr> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 SCOPUS_ID:8… 1 1 Md. Ah… Md Aha… M.A. Habib Habib … S00489… 1
2 SCOPUS_ID:8… 1 2 Prosun Prosun P. Bhatta… Bhatta… S00489… 5
3 SCOPUS_ID:8… 2 1 Md. Ah… Md Aha… M.A. Habib Habib … S00489… 1
4 SCOPUS_ID:8… 2 2 Md. Ab… Md Abd… M.A. Haque Haque … S00489… 3
5 SCOPUS_ID:8… 2 3 Md. Mi… Md Mir… M.M.A. Raihan Raihan… S00489… 4
6 SCOPUS_ID:8… 3 1 Serena Serena S. Coccio… Coccio… S00489… 2
7 SCOPUS_ID:8… 4 1 Anna Anna A. Tompse… Tompse… S00489… 6
8 SCOPUS_ID:8… 5 1 Anna Anna A. Tompse… Tompse… S00489… 6
# … with 16 more variables: ce_initials <chr>, fa <chr>, type <chr>,
# ce_surname <chr>, auid <chr>, ce_indexed_name <chr>, country <chr>,
# afid <chr>, country3 <chr>, city <chr>, organization <chr>,
# affiliation_id <chr>, affiliation_instance_id <chr>, ce_source_text <chr>,
# dptid <chr>, raw_org <chr>, and abbreviated variable names ¹ce_given_name,
# ²preferred_name_ce_given_name, ³preferred_name_ce_initials,
# ⁴preferred_name_ce_surname, ⁵preferred_name_ce_indexed_name, …
$scopus_correspondence
# A tibble: 1 × 5
sid ce_given_name ce_initials ce_surname ce_indexed_name
<chr> <chr> <chr> <chr> <chr>
1 SCOPUS_ID:85148543994 Md. Ahasan M.A. Habib Habib M.A.
ORCiDs versus KTH identifiers
To automatically look up known KTH identifiers for researchers (kthids) from ORCiDs, the known associations can be loaded up front.
This step is optional: without it, the associations are looked up on an article-by-article basis. Loading them up front can speed up the process.
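The idea of a pre-loaded lookup can be sketched with a toy table. The structure of the object returned by kthid_orcid() is not shown in this document, so the column names `orcid` and `kthid` and all values below are made-up placeholders, not real identifiers.

```r
# Hypothetical lookup table; the real kthid_orcid() result may use
# different column names (assumption for illustration only)
ko_example <- data.frame(
  orcid = c("0000-0000-0000-0001", "0000-0000-0000-0002"),
  kthid = c("u1example", "u1sample"),
  stringsAsFactors = FALSE
)

# ORCiDs found on an article; the second one has no known kthid
article_orcids <- c("0000-0000-0000-0002", "0000-0000-0000-0009")

# match() returns NA where no association is known
kthids <- ko_example$kthid[match(article_orcids, ko_example$orcid)]
kthids
```

With the table in memory, each lookup is a vector match rather than a separate request per article, which is where the speed-up would come from.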
ko <- kthid_orcid()
Generating MODS for articles
Different publication types require slightly different kinds of MODS file content.
To work with Scopus articles, filter on the publication subtype, like so:
# subtype == "cp" # conference paper
# subtype == "ar" # article
# subtype == "ch" # book chapter
articles <- scopus$publications |> filter(subtype == "ar")
To generate MODS for a specific article, we first need its Scopus identifier:
sids <- articles$`dc:identifier`
sid <- sids |> head(1)
# we provide previous scopus search results and kthid_orcid pairs
# to avoid runtime lookups for this data
mods <- sid |> scopus_mods(scopus = scopus, ko = ko)
mods |> xml2::read_xml() |> as.character() |> cat()
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-2.xsd">
<mods xmlns="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" version="3.7" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-7.xsd">
<genre authority="diva" type="contentTypeCode">referee</genre>
<genre authority="diva" type="publicationTypeCode">article</genre>
<genre authority="svep" type="publicationType">art</genre>
<genre authority="diva" type="publicationType" lang="eng">Article in journal</genre>
<genre authority="kev" type="publicationType" lang="eng">article</genre>
<name type="personal" authority="kth" href="NA">
<namePart type="family">Habib</namePart>
<namePart type="given">Md. Ahasan</namePart>
<role>
<roleTerm type="code" authority="marcrelator">aut</roleTerm>
</role>
<affiliation><![CDATA[KTH-International Groundwater Arsenic Research Group, Department of Sustainable Development, Environmental Science and Engineering, KTH Royal Institute of Technology, Stockholm, Sweden; NGO Forum for Public Health, Dhaka, Bangladesh]]></affiliation>
</name>
<name type="personal" authority="kth" href="NA">
<namePart type="family">Cocciolo</namePart>
<namePart type="given">Serena</namePart>
<role>
<roleTerm type="code" authority="marcrelator">aut</roleTerm>
</role>
<affiliation><![CDATA[World Bank, Washington D.C., USA]]></affiliation>
</name>
<name type="personal" authority="kth" href="NA">
<namePart type="family">Haque</namePart>
<namePart type="given">Md. Abdul</namePart>
<role>
<roleTerm type="code" authority="marcrelator">aut</roleTerm>
</role>
<affiliation><![CDATA[NGO Forum for Public Health, Dhaka, Bangladesh]]></affiliation>
</name>
<name type="personal" authority="kth" href="NA">
<namePart type="family">Raihan</namePart>
<namePart type="given">Md. Mir Abu</namePart>
<role>
<roleTerm type="code" authority="marcrelator">aut</roleTerm>
</role>
<affiliation><![CDATA[NGO Forum for Public Health, Dhaka, Bangladesh]]></affiliation>
</name>
<name type="personal" authority="kth" href="NA">
<namePart type="family">Bhattacharya</namePart>
<namePart type="given">Prosun</namePart>
<role>
<roleTerm type="code" authority="marcrelator">aut</roleTerm>
</role>
<affiliation><![CDATA[KTH-International Groundwater Arsenic Research Group, Department of Sustainable Development, Environmental Science and Engineering, KTH Royal Institute of Technology, Stockholm, Sweden]]></affiliation>
</name>
<name type="personal" authority="kth" href="NA">
<namePart type="family">Tompsett</namePart>
<namePart type="given">Anna</namePart>
<role>
<roleTerm type="code" authority="marcrelator">aut</roleTerm>
</role>
<affiliation><![CDATA[Institute for International Economic Studies, Stockholm University, Sweden; Beijer Institute for Ecological Economics, Royal Academy of Sciences, Sweden]]></affiliation>
</name>
<titleInfo lang="eng">
<title>How to clean a tubewell: the effectiveness of three approaches in reducing coliform bacteria</title>
</titleInfo>
<originInfo>
<publisher>Elsevier B.V.</publisher>
<dateIssued>2023</dateIssued>
<dateOther type="availableFrom">10 May 2023</dateOther>
</originInfo>
<physicalDescription>
<form authority="marcform">print</form>
</physicalDescription>
<identifier type="doi">10.1016/j.scitotenv.2023.161932</identifier>
<identifier type="scopus">2-s2.0-85148543994</identifier>
<identifier type="eissn">18791026</identifier>
<identifier type="issn">00489697</identifier>
<identifier type="articleId">161932</identifier>
<typeOfResource>text</typeOfResource>
<location>
<url>https://api.elsevier.com/content/abstract/scopus_id/85148543994</url>
</location>
<subject lang="eng">
<topic>Cleaning/maintenance</topic>
<topic>Coliform bacteria</topic>
<topic>Deep tubewells</topic>
<topic>Disinfection</topic>
<topic>Drinking water</topic>
</subject>
<abstract lang="eng"><![CDATA[Access to safe drinking water in rural Bangladesh remains a perpetual challenge. Most households are exposed to either arsenic or faecal bacteria in their primary source of drinking water, usually a tubewell. Improving tubewell cleaning and maintenance practices might reduce exposure to faecal contamination at a potentially low cost, but whether current cleaning and maintenance practices are effective remains uncertain, as does the extent to which best practice approaches might improve water quality. We used a randomized experiment to evaluate how effectively three approaches to cleaning a tubewell improved water quality, measured by total coliforms and E. coli. The three approaches comprise the caretaker's usual standard of care and two best-practice approaches. One best-practice approach, disinfecting the well with a weak chlorine solution, consistently improved water quality. However, when caretakers cleaned the wells themselves, they followed few of the steps involved in the best-practice approaches, and water quality declined rather than improved, although the estimated declines are not consistently statistically significant. The results suggest that, while improvements to cleaning and maintenance practices might help reduce exposure to faecal contamination in drinking water in rural Bangladesh, achieving widespread adoption of more effective practices would require significant behavioural change.]]></abstract>
<note>Imported from Scopus. VERIFY.</note>
<relatedItem type="host">
<titleInfo>
<title>Science of the Total Environment</title>
</titleInfo>
<identifier type="issn">00489697</identifier>
<part>
<detail type="volume">
<number>872</number>
</detail>
<detail type="issue">
<number>NA</number>
</detail>
<extent>
<start>NA</start>
<end>NA</end>
</extent>
</part>
</relatedItem>
<!-- <note type="funder">@Funder@ [@project_number_from_funder@]</note> -->
</mods>
</modsCollection>
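The generated record carries the note "Imported from Scopus. VERIFY." and contains some literal NA values, for example in the issue number. A minimal sketch of how such a record could be inspected with xml2 follows; it uses a small hand-written fragment that mimics the structure above rather than real scopus_mods() output, and the names `mods_example`, `doi`, and `na_paths` are illustrative.

```r
library(xml2)

# Small hand-written MODS fragment mimicking the structure shown above
mods_example <- '<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods>
    <identifier type="doi">10.1016/j.scitotenv.2023.161932</identifier>
    <relatedItem type="host">
      <part>
        <detail type="volume"><number>872</number></detail>
        <detail type="issue"><number>NA</number></detail>
      </part>
    </relatedItem>
  </mods>
</modsCollection>'

doc <- read_xml(mods_example)
# Drop the default namespace so plain XPath expressions work
xml_ns_strip(doc)

# Pull out a single field
doi <- xml_text(xml_find_first(doc, "//identifier[@type='doi']"))

# Find leaf elements whose text is the literal string "NA"
na_nodes <- xml_find_all(doc, "//*[not(*) and normalize-space(text()) = 'NA']")
na_paths <- xml_path(na_nodes)
na_paths
```

Listing the paths of NA-valued elements gives a quick checklist of fields that may need manual completion before import.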
The scopus_mods_crawl() function is vectorised, which means it can iterate over several Scopus identifiers:
my_sids <- sids |> head(10)
my_mods <- my_sids |> scopus_mods_crawl(scopus = scopus, ko = ko)
Generating MODS parameters for 10 identifiers...
Generating MODS based on parameters...
Returning 10 MODS
names(my_mods)
[1] "SCOPUS_ID:85148543994" "SCOPUS_ID:85148537060" "SCOPUS_ID:85148534117"
[4] "SCOPUS_ID:85148499622" "SCOPUS_ID:85148520239" "SCOPUS_ID:85148532021"
[7] "SCOPUS_ID:85148504588" "SCOPUS_ID:85148528636" "SCOPUS_ID:85148524172"
[10] "SCOPUS_ID:85148505140"
my_mods$`SCOPUS_ID:85147171092` |> cat()
A zip file with the results can be generated and included for download in a Quarto document.
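One way to produce such an archive is sketched below using base R only. The list `mods_list` is a stand-in for the named list returned by scopus_mods_crawl(), the file naming scheme is an assumption, and utils::zip() depends on an external zip program being available on the system.

```r
# Stand-in for the named list returned by scopus_mods_crawl()
mods_list <- list(
  "SCOPUS_ID:1" = "<mods>first</mods>",
  "SCOPUS_ID:2" = "<mods>second</mods>"
)

out_dir <- file.path(tempdir(), "mods")
dir.create(out_dir, showWarnings = FALSE)

# One XML file per record; ':' is replaced to keep file names portable
files <- file.path(out_dir, paste0(gsub(":", "_", names(mods_list)), ".xml"))
for (i in seq_along(mods_list)) {
  writeLines(mods_list[[i]], files[i])
}

# utils::zip() relies on an external zip program being installed;
# "-j" stores the files without their directory paths
zipfile <- file.path(tempdir(), "mods.zip")
utils::zip(zipfile, files, flags = "-j9X")
```

The resulting file could then be offered as a download button in the rendered document, for instance with the downloadthis package loaded at the top of this article; its documentation describes the available functions and arguments.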